Element-wise multiplication

In R, when a vector (or column) is multiplied (*) by a constant, each element in the vector is multiplied by that constant.

Below is a simple example using vectors…

# element wise multiplication - vector

x <- c(2, 4, 6, 8)
x
## [1] 2 4 6 8
2*x
## [1]  4  8 12 16

This is also true in a tibble (or dataset) using tidyverse functions.

# element wise multiplication - tibble

library(tidyverse)  # provides tibble(), mutate(), and filter()

df <- tibble(x)
df
df <- df |>
  mutate(y=x*2)

df

We can see that the element-wise property works for a constant. It also works for vector-to-vector multiplication.

For example,

x
## [1] 2 4 6 8
y <- c(1,1,2,2)
y
## [1] 1 1 2 2
x*y
## [1]  2  4 12 16

Each element of the x vector is multiplied by the element in the same position of the y vector.

What happens if the vectors are not the same size?

x
## [1] 2 4 6 8
y2 <- c(1,1,2,2,3)

x*y2
## Warning in x * y2: longer object length is not a multiple of shorter object
## length
## [1]  2  4 12 16  6
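The warning above comes from R's recycling rule: when two vectors differ in length, the shorter one is repeated until it matches the longer one. A minimal sketch in base R of what happened here (the name x_recycled is made up for illustration):

```r
# Recycling: the shorter vector is repeated to the length of the longer one.
x  <- c(2, 4, 6, 8)
y2 <- c(1, 1, 2, 2, 3)

# x is shorter, so its first element is reused for the fifth position.
x_recycled <- rep(x, length.out = length(y2))
x_recycled
## [1] 2 4 6 8 2

# Same values as x * y2, but without the warning.
x_recycled * y2
## [1]  2  4 12 16  6

# When the longer length IS a multiple of the shorter, recycling is silent.
c(1, 2) * c(10, 20, 30, 40)
## [1] 10 40 30 80
```

The last element of the result, 6, is 2 * 3: the first element of x paired with the fifth element of y2. Because silent recycling can hide mistakes, it is usually worth checking that two vectors have the same length before multiplying them.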

Comparisons

Logical vectors are vectors whose elements are all TRUE or FALSE (or NA, when a value is missing). You do not often see logical variables in a data frame, but they are often created and used in the intermediate steps of a transformation.

One common way to create a logical vector is with a comparison (<, >, <=, >=, !=, ==):

# using comparisons

x
## [1] 2 4 6 8
y3 <- x <= 4
y3
## [1]  TRUE  TRUE FALSE FALSE

This of course works for columns in a dataframe as well…

df
df <- df |>
  mutate(z = x <= 4)
df

This column z shows you the logic behind what happens when we filter. If we use the same comparison in filter() rather than mutate(), the rows where the comparison is TRUE remain and the rows where it is FALSE are “filtered” out.

df
df |>
  filter(x <= 4)

is.na()

Missing values are simply unknown values in the data.

When a missing value is used in a comparison, the result is almost always NA.

The examples from the text are

NA>5
## [1] NA
10 == NA
## [1] NA

It may be confusing that comparing NA to NA also gives NA.

NA == NA
## [1] NA

This is best illustrated with an example from the text [1].

# We don't know how old Mary is
age_mary <- NA

# We don't know how old John is
age_john <- NA

# Are Mary and John the same age?
age_mary == age_john
## [1] NA

We don’t know!

If you want to find all the missing values in a column of a dataframe (for example, dep_time in the flights dataframe from the nycflights13 package), you cannot use the following code [1]:

flights |>
  filter(dep_time == NA)

This returns a tibble with no rows: dep_time == NA evaluates to NA for every row, so there are no rows where the comparison is TRUE.

Instead we use the function is.na():

flights |>
  filter(is.na(dep_time))
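The difference between == NA and is.na() is easier to see on a small vector. A minimal sketch in base R, where dep is a made-up vector standing in for a column such as dep_time:

```r
# Hypothetical departure times with two missing values.
dep <- c(517, NA, 533, NA)

# Comparing with NA never gives TRUE or FALSE, only NA.
dep == NA
## [1] NA NA NA NA

# is.na() returns TRUE exactly where the value is missing.
is.na(dep)
## [1] FALSE  TRUE FALSE  TRUE

# Because TRUE counts as 1, sum() gives the number of missing values.
sum(is.na(dep))
## [1] 2
```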

Boolean algebra

Boolean algebra is the set of operations that we typically perform on logical values. In R, these operations are & (and), | (or), ! (not), and xor() (exclusive or).

Your text [1] provides a great visual.
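A small sketch of the Boolean operators in base R, applied to two made-up logical vectors a and b that together cover every TRUE/FALSE combination:

```r
# Every combination of TRUE and FALSE across two vectors.
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

a & b      # AND: TRUE only where both are TRUE
## [1]  TRUE FALSE FALSE FALSE

a | b      # OR: TRUE where at least one is TRUE
## [1]  TRUE  TRUE  TRUE FALSE

!a         # NOT: flips TRUE and FALSE
## [1] FALSE FALSE  TRUE  TRUE

xor(a, b)  # exclusive OR: TRUE where exactly one is TRUE
## [1] FALSE  TRUE  TRUE FALSE
```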

In contrast to our code before, if we wanted to eliminate all NAs, one way to do that would be…

# Filter (display) all values that are NOT NA

df |> 
  filter(!is.na(x))

Order of Operations with Boolean Algebra

The order of operations for Boolean operators doesn’t always read like an English sentence.

For example, using our flights data, suppose we want to keep only the observations where the month is November OR December.

The code is not…

flights |>
  filter(month == 11 | 12)

You can see that this returns rows for every month. Why? R evaluates month == 11 first and then combines that result with 12, which it treats as TRUE. Your text [1] provides a good explanation.
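This behavior can be reproduced step by step on a small vector. The sketch below uses a made-up month vector in base R rather than the flights data:

```r
# Hypothetical months: January, November, December.
month <- c(1, 11, 12)

# == is evaluated before |, so this is (month == 11) | 12.
wrong <- month == 11 | 12
wrong
## [1] TRUE TRUE TRUE

# A non-zero number used with | is treated as TRUE,
# so the expression above is the same as this one.
same_as_wrong <- (month == 11) | TRUE
same_as_wrong
## [1] TRUE TRUE TRUE
```

Because every element is TRUE, a filter() built on this condition keeps every row.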

The code we should be using is…

flights |> 
  filter(month == 11 | month == 12)

Now, R knows what the 12 is referring to and supplies the wanted observations.

You may recall from Module A2 that I introduced the %in% operator. Remember it, as you may find it useful when trying to obtain a desired subset of your dataframe.

For example, if you wanted to filter flights in October, November, and December, you could write the code…

flights |>
  filter(month==10 | month==11 | month == 12)

OR you could use %in% and the combine function c():

flights |>
  filter(month %in% c(10, 11, 12))
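To check that the two forms select the same elements, here is a minimal sketch on a made-up month vector in base R (long_form and short_form are illustrative names):

```r
# Hypothetical months to test against.
month <- c(9, 10, 11, 12, 1)

long_form  <- month == 10 | month == 11 | month == 12
short_form <- month %in% c(10, 11, 12)

short_form
## [1] FALSE  TRUE  TRUE  TRUE FALSE

# Both approaches give exactly the same logical vector.
identical(long_form, short_form)
## [1] TRUE

# One difference: with a missing value, == gives NA but %in% gives FALSE.
NA == 10
## [1] NA
NA %in% 10
## [1] FALSE
```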

Now you try it…

Conditional Transformations

There are two functions that are helpful when dealing with conditional transformations, that is to say, transformations based on some condition.

The first is the if_else() function.

The first argument of the if_else function is the condition, the 2nd argument is the result if the condition is TRUE, the 3rd is the result if the condition is FALSE.

For example, suppose we wanted to create a variable based on the value of another variable:

x2 <- c(10, 20, 30, 40, 50)
x2
## [1] 10 20 30 40 50
y2 <- if_else(x2 > 35, "GREATER", "LESS")
y2
## [1] "LESS"    "LESS"    "LESS"    "GREATER" "GREATER"
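When the condition itself is NA, if_else() returns NA for that element. The sketch below uses made-up vectors (x4, y4, y5) and the optional missing argument of if_else() to show one way of handling that case. It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical input with one missing value.
x4 <- c(10, NA, 50)

# NA > 35 is NA, so the result is NA for that element.
y4 <- if_else(x4 > 35, "GREATER", "LESS")
y4
## [1] "LESS"    NA        "GREATER"

# The missing argument supplies a value to use when the condition is NA.
y5 <- if_else(x4 > 35, "GREATER", "LESS", missing = "UNKNOWN")
y5
## [1] "LESS"    "UNKNOWN" "GREATER"
```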

If you have more than two categories to assign, you may want to use the case_when() function (from dplyr):

x2 <- c(10, 20, 30, 40, 50)
x2
## [1] 10 20 30 40 50
y2 <- case_when(
  x2 < 30   ~ "LESS",
  x2 == 30  ~ "EQUAL",
  x2 > 30   ~ "GREATER"
)
y2
## [1] "LESS"    "LESS"    "EQUAL"   "GREATER" "GREATER"
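Two details of case_when() are worth keeping in mind: conditions are checked in order and the first one that is TRUE wins, and an element that matches no condition becomes NA. A minimal sketch with deliberately overlapping, made-up conditions (assuming dplyr is installed):

```r
library(dplyr)

x2 <- c(10, 20, 30, 40, 50)

# The second condition is never used: any value above 30 is also
# above 10, and the first matching condition wins.
# 10 matches neither condition, so it becomes NA.
y_overlap <- case_when(
  x2 > 10 ~ "ABOVE 10",
  x2 > 30 ~ "ABOVE 30"
)
y_overlap
## [1] NA         "ABOVE 10" "ABOVE 10" "ABOVE 10" "ABOVE 10"
```

Writing the most specific condition first (here x2 > 30) avoids this problem.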

These functions will work with categorical variables as well.

x3 <- c("Freshman", "Sophomore", "Junior", "Senior", "Junior")
x3 
## [1] "Freshman"  "Sophomore" "Junior"    "Senior"    "Junior"
y2 <- if_else(x3 == "Junior", "Year 3", x3)
y2
## [1] "Freshman"  "Sophomore" "Year 3"    "Senior"    "Year 3"
y3 <- case_when(
  x3 == "Freshman" ~ "Year 1",
  x3 == "Sophomore" ~ "Year 2",
  x3 == "Junior" ~ "Year 3",
  x3 == "Senior" ~ "Year 4",
  TRUE ~ x3
)
y3
## [1] "Year 1" "Year 2" "Year 3" "Year 4" "Year 3"
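The TRUE ~ x3 line acts as a catch-all for values that match none of the earlier conditions. Assuming dplyr 1.1.0 or later, the same fallback can be written with the .default argument. The sketch below uses a made-up vector x5 with a "Graduate" value to show the fallback in action:

```r
library(dplyr)

# Hypothetical input with a value that no condition covers.
x5 <- c("Freshman", "Sophomore", "Junior", "Senior", "Graduate")

y_default <- case_when(
  x5 == "Freshman"  ~ "Year 1",
  x5 == "Sophomore" ~ "Year 2",
  x5 == "Junior"    ~ "Year 3",
  x5 == "Senior"    ~ "Year 4",
  .default = x5      # keep the original value when nothing matches
)
y_default
## [1] "Year 1"   "Year 2"   "Year 3"   "Year 4"   "Graduate"
```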

Now you try it…

Resources

  1. [R for Data Science, 2nd edition](https://r4ds.hadley.nz/logicals)